CS 294-1 Assignment 2 Report

Authors

  • Cheng Lu
  • Yun Jin
Abstract

In this report, we describe our implementation of a linear regression method for classifying numerically scored sentiment data. The dataset, collected by Mark Dredze and others at Johns Hopkins, contains 1M amazon.com book reviews. The method starts by reading tokenized data and building a word-count map, then trains a linear classifier by minimizing an error term. We evaluate the method with 10-fold cross-validation. We describe our implementation of each of these steps, along with enhancements such as stop-word removal.
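The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the language, the optimizer, the stop-word list, or the exact error term, so the sketch assumes squared error minimized by stochastic gradient descent, and every name in it (`STOP_WORDS`, `featurize`, `train`, `cross_validate`, the learning rate, and the epoch count) is hypothetical.

```python
import random
from collections import Counter

# Hypothetical stop-word list; the report does not say which list was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "this"}

def featurize(tokens):
    """Map a tokenized review to sparse word counts, dropping stop words."""
    return Counter(t for t in tokens if t not in STOP_WORDS)

def predict(weights, bias, features):
    """Linear model: bias plus the count-weighted sum of word weights."""
    return bias + sum(weights.get(w, 0.0) * c for w, c in features.items())

def train(examples, epochs=50, lr=0.05):
    """Fit weights by stochastic gradient descent on squared error (assumed loss)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, score in examples:
            err = predict(weights, bias, features) - score
            bias -= lr * err
            for w, c in features.items():
                weights[w] = weights.get(w, 0.0) - lr * err * c
    return weights, bias

def cross_validate(examples, k=10, seed=0):
    """Return the mean squared error averaged over k held-out folds."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    fold_errors = []
    for i in range(k):
        held_out = data[i::k]  # every k-th example forms fold i
        training = [ex for j, ex in enumerate(data) if j % k != i]
        weights, bias = train(training)
        mse = sum((predict(weights, bias, f) - y) ** 2
                  for f, y in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k
```

Each example is a pair of a word-count mapping and a numeric score, so a review would be prepared as `(featurize(tokens), score)` before being passed to `train` or `cross_validate`. At the 1M-review scale mentioned in the abstract, a sparse-matrix library would likely be needed in place of plain dictionaries.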




Publication date: 2012